Introducing a very large dataset of handwritten Farsi digits and a study on their varieties

Authors

  • Hossein Khosravi
  • Ehsanollah Kabir
Abstract

A very large dataset of handwritten Farsi digits is introduced. Binary images of 102,352 digits were extracted from about 12,000 registration forms of two types, filled in by B.Sc. and senior high school students. These forms were scanned at 200 dpi with a high-speed scanner. A method for finding the variety of handwritten digits in a typical dataset is proposed. Based on this method, training and test subsets are provided to facilitate sharing of results among researchers as well as performance comparison. © 2007 Elsevier B.V. All rights reserved.

Similar resources

The biologically inspired Hierarchical Temporal Memory

A handwritten digit recognition system is proposed that is biologically inspired by the large-scale structure of the mammalian neocortex. Hierarchical Temporal Memory (HTM) is a memory-prediction network model that takes advantage of Bayesian belief propagation and revision techniques. In this article, a study has been conducted to train an HTM network to recognize handwritten digits ...

Persian Handwritten Digit Recognition Using Particle Swarm Probabilistic Neural Network

Handwritten digit recognition can be categorized as a classification problem. The Probabilistic Neural Network (PNN) is one of the most effective and useful classifiers and is based on Bayes' rule. In this paper, a combination of an intelligent clustering method and PNN is used to recognize Persian (Farsi) handwritten digits. Hoda database, which includes 80000 P...

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

A Study on Farsi Handwriting Styles for Online Recognition

Knowing the varieties of writing a letter in a word or a subword in different handwriting styles is very beneficial in recognition, specifically for online recognition. In this paper, the TMU-OFS dataset, consisting of 1000 frequent Farsi subwords, is employed to study Farsi handwriting styles. The subwords are grouped based on their delayed strokes and their main bodies, separately. The handwriting style...

Journal:
  • Pattern Recognition Letters

Volume: 28

Publication date: 2007